Abstract

The paper contributes to the scarce literature modeling the term structure of crude
oil markets. We explain the term structure of crude oil prices using the dynamic
Nelson-Siegel model and propose to forecast it with a generalized regression
framework based on neural networks. The newly proposed framework is empirically
tested on 24 years of crude oil futures prices covering several important recessions
and crisis periods. We find the 1-month, 3-month, 6-month and 12-month-ahead
forecasts obtained from a focused time-delay neural network to be significantly more
accurate than forecasts from other benchmark models. The proposed forecasting
strategy produces the lowest errors across all times to maturity.
1. Introduction

Modeling and forecasting the term structures of commodity markets is attractive
from an academic perspective and valuable for producers, speculators, and risk
managers. Generally, the term structure illustrates expectations about the future
development of the corresponding market. Notwithstanding its high importance,
there is almost no relevant literature forecasting commodity term structures. In
this paper, we introduce a novel framework for forecasting the term structure of
crude oil futures prices. We propose to couple dynamic neural networks with the
Nelson-Siegel model to obtain precise forecasts of crude oil futures prices.

*Support from the Czech Science Foundation under the P402/12/G097 DYME - “Dynamic
Models in Economics” project is gratefully acknowledged. The research leading to these results
has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013)
under grant agreement No. FP7-SSH-612955 (FinMaP).

Email address: barunik@utia.cas.cz (Jozef Baruník)

Preprint submitted to Elsevier

April 21, 2015


Crude oil is essential to world economies from the industrial perspective, as it
is a vital input of production, and its price is driven by distinct demand and supply
shocks. Shifts in the price of oil are driven to different extents by aggregate
or precautionary demand related to market anxieties about the availability of
future oil supplies. As the demand for crude oil, which depends less on price than
on income (Hamilton, 2009), continues to rise and supply is likely to decline
(owing to the nature of crude oil as a limited resource), the literature agrees that
the future development of crude oil prices is highly volatile and hence uncertain
(Pan et al., 2009). The main reasons why the crude oil market is one of the most
volatile in the world are rising demand, supply that depends strongly on the
behavior of politically and economically unstable countries, demand and production
that are heavily correlated with exogenous events such as military conflicts and
natural catastrophes, and the presence of speculators (Büyükşahin and Harris, 2011).

With the crude oil futures market being one of the most developed markets in terms
of trading volume, understanding the behavior of its term structure becomes even
more important. Nevertheless, the literature modeling and forecasting the term
structure of petroleum markets is rather scarce (see Lautier (2005) for a review).
Similarly to interest rate models, there are two approaches to modeling the term
structure of petroleum commodities. The spot price, a natural candidate for the
state variable in a one-factor model, is modeled as a geometric Brownian motion
(Brennan and Schwartz, 1985) or as a mean-reverting process (Schwartz, 1997).
Later, researchers started to consider the convenience yield as a second state
variable in a two-factor model (Schwartz, 1997). Alternatively, Gabillon (1991)
employs the long-term price as the second state variable. While both approaches
assume a constant interest rate, which implies that futures and forward prices are
the same, Cortazar and Schwartz (2003) developed a three-factor model.

A relatively recent surge of literature explaining commodity futures prices uses
the approach of Diebold and Li (2006), originally introduced to model yield curves.
Motivated by the similarity of stylized facts between commodity markets and
interest rate markets, the dynamic Nelson-Siegel model is a natural candidate for
this task. Among the few, Karstanje et al. (2015) examine the comovement of factors
driving commodity futures curves and their shapes by adopting the framework of the
dynamic Nelson-Siegel model (Diebold and Li, 2006). The joint dynamics of factors
driving commodity futures curves in a multiple-regime framework is further studied
by Nomikos and Pouliasis (2015). Almansour (2014) models the futures term structure
of crude oil and natural gas markets with switching regimes, and Heidorn et al.
(2015) regress futures curve factors extracted from the dynamic Nelson-Siegel model
on fundamental and financial traders. While the dynamic Nelson-Siegel model
explains the dynamics of the factors underlying the term structure of commodity
prices, the literature is silent about forecasting, with the only exception of
Grønborg and Lunde (2015). In their original work, Diebold and Li (2006) propose to
use simple autoregressive time series models to successfully forecast the dynamics
of term structure factors, and hence prices, in the interest rate market. We
hypothesize that factors in commodity markets may contain further nonlinear
dependencies, which need to be modeled in order to obtain precise forecasts.
Therefore, the application of more general methods, which do not require
restrictive assumptions about the underlying structure of the factors, is
appropriate.

A natural candidate for the forecasting task is a neural network, which can be
viewed as a generalized nonlinear regression tool. Concisely, neural networks are
semi-parametric nonlinear models that are able to approximate any reasonable
function (Haykin, 2007; Hornik et al., 1989). Whereas the number of models using
machine learning is growing rapidly in the academic literature, applications in
energy markets are very limited. While several works use neural networks in
energy forecasting (Fan et al., 2008; Yu et al., 2008; Xiong et al., 2013; Jammazi
and Aloui, 2012; Papadimitriou et al., 2014; Baruník and Křehlík, 2014), we are
the first to employ the approach in forecasting term structures.

The contribution of this work is twofold. First, we enhance the scarce literature
studying the term structure of commodity prices with new results from the
application of the dynamic Nelson-Siegel modeling strategy to the crude oil futures
markets over the long period 1990 - 2014. Second, we propose to use a time-delay
neural network to forecast the term structure factors identified by the dynamic
Nelson-Siegel model. Using this framework, we successfully forecast the term
structure of crude oil futures prices over the 1-month, 3-month, 6-month and
12-month forecasting horizons.

2. Data 

2.1. Raw data 

The data set consists of monthly closing prices of West Texas Intermediate (WTI)
futures contracts^ traded on the New York Mercantile Exchange (NYMEX). Each
contract expires three trading days prior to the 25th calendar day of the month
preceding the month of delivery.^ In total, we analyze 396 monthly historical
(already delivered) and to-date undelivered contracts - 12 contracts per year with
delivery months in the period starting in 1990. The undelivered contracts
represented in the dataset are the contracts with delivery in November and December
2014 and 24 contracts with delivery in the two subsequent years, 2015 and 2016.

^Available at https://www.quandl.com/c/futures/cme-wti-crude-oil-futures
^Full specification of WTI futures contracts available at
http://www.cmegroup.com/trading/energy/files/en-153_wti_brochure_sr.pdf

             CLQ2003          CLU2003          CLV2003
Date         Settle   τ       Settle   τ       Settle   τ
28.2.2001    21,72    625     21,62    646     -        -
31.3.2001    22,76    602     22,70    623     -        -
30.4.2001    23,46    582     23,35    603     -        -
31.5.2001    23,57    559     23,45    580     23,33    603

Table 1: Example of futures prices and corresponding maturities for contracts traded
between 28.2.2001 and 31.5.2001. CME product code CL is used for the WTI futures
contract; the letters Q, U, and V denote delivery in August, September and October.


The main reason for using data starting from 1990 is that the maximum time
to maturity for contracts before this date was up to nine months, while later
during the period it increased to more than six years. Hence, to avoid potentially
large risk and inaccuracies stemming from data extrapolation, we consider only
data after the year 1990. The choice of the monthly frequency is mainly driven by
the fact that contracts with longer times to maturity were traded rather
infrequently in the first half of the studied period. In addition, Baumeister et al.
(2015) find monthly data to have predictive ability equal to that of daily data.

Table 1 presents an example of actual data to illustrate the structure and
dimension of the dataset. In order to associate each observation of a futures price
with the corresponding time to maturity, it is first necessary to find the exact
expiry date of each contract. Then, the difference between the expiry date and the
date of observation gives the remaining days to maturity. Table 1 captures
end-of-month futures prices of three different (in this case consecutive) contracts
with delivery in August, September and October 2003. For example, at the end of
February 2001, the CLQ2003 and CLU2003 contracts were traded. On February 28, 2001,
it was possible to enter into a contract with delivery in August 2003 at a futures
price of USD 21,72 per barrel. The respective time to maturity ($\tau$) was 625
trading days.
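The matching of observations to maturities described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the helper names (`previous_weekday`, `wti_expiry`, `weekdays_between`) are hypothetical, exchange holidays are ignored, and a "trading day" is approximated by a weekday.

```python
from datetime import date, timedelta


def previous_weekday(d: date) -> date:
    """Step back one day at a time until d falls on a weekday (Mon-Fri)."""
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


def wti_expiry(delivery_year: int, delivery_month: int) -> date:
    """Approximate expiry: three weekdays before the 25th calendar day of the
    month preceding delivery. Assumption: exchange holidays are ignored."""
    year, month = delivery_year, delivery_month - 1
    if month == 0:
        year, month = year - 1, 12
    d = previous_weekday(date(year, month, 25))
    for _ in range(3):
        d = previous_weekday(d - timedelta(days=1))
    return d


def weekdays_between(start: date, end: date) -> int:
    """Number of weekdays in the half-open interval [start, end)."""
    count = 0
    d = start
    while d < end:
        if d.weekday() < 5:
            count += 1
        d += timedelta(days=1)
    return count


expiry = wti_expiry(2003, 8)                      # CLQ2003: August 2003 delivery
tau = weekdays_between(date(2001, 2, 28), expiry)
print(expiry, tau)
```

Because the count depends on the holiday calendar and on whether the end points are included, this sketch need not reproduce the $\tau$ values in Table 1 exactly.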


                        Days to maturity (τ)
Date         30      60      90      120     150     180     210
28.2.2001    27,48   27,36   26,99   26,60   26,21   25,84   25,48   ...
31.3.2001    26,50   26,59   26,43   26,20   25,94   25,68   25,43   ...
30.4.2001    28,74   28,89   28,53   28,07   27,60   27,19   26,78   ...
31.5.2001    28,49   28,42   28,14   27,78   27,38   27,00   26,59   ...

Table 2: Example of reorganized data set to constant time to maturity
2.2. Reorganized data

After combining the days to maturity with each observed futures price quotation,
the desired form of the dataset is a matrix with the number of rows equal to the
number of days included in the analysis and the number of columns equal to the
number of analyzed maturities.

The time series captured in Table 2 are reorganized constant-maturity futures
prices. WTI crude oil futures are delivered and expire with one-month regularity;
therefore, futures prices with exactly 30, 60, or 90 days to maturity are not traded
every day. There are several ways in the literature to interpolate the prices to
obtain the desired form of the data. Diebold and Li (2006) use linear interpolation
for constant maturity, while Holton (2003) prefers cubic spline interpolation.^ In
our work, we follow the approach of Holton (2003) and use cubic spline interpolation.
Figure 1 illustrates the reorganized constant-maturity futures prices we work
with, plotted against the daily evolution of the spot price.
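The interpolation step can be illustrated with a small natural cubic spline. This is a sketch under assumptions, not the procedure used in the paper: the function name `natural_cubic_spline` is hypothetical, the natural boundary condition (zero second derivative at both ends) is an assumption because the text does not state one, and the observed maturities and prices below are made-up numbers.

```python
import numpy as np


def natural_cubic_spline(x, y, x_new):
    """Evaluate the natural cubic spline through (x, y) at x_new.

    x must be strictly increasing and x_new must lie inside [x[0], x[-1]].
    """
    x, y, x_new = (np.asarray(a, dtype=float) for a in (x, y, x_new))
    n = len(x) - 1
    h = np.diff(x)

    # Linear system for the second derivatives M at the knots (M[0] = M[n] = 0).
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    # Find the interval of every query point and evaluate its cubic piece.
    j = np.clip(np.searchsorted(x, x_new, side="right") - 1, 0, n - 1)
    left, right = x_new - x[j], x[j + 1] - x_new
    return (
        M[j] * right**3 / (6.0 * h[j])
        + M[j + 1] * left**3 / (6.0 * h[j])
        + (y[j] / h[j] - M[j] * h[j] / 6.0) * right
        + (y[j + 1] / h[j] - M[j + 1] * h[j] / 6.0) * left
    )


# Hypothetical quotes on one date: days to maturity of the traded contracts.
observed_days = np.array([12.0, 33.0, 54.0, 75.0, 96.0, 117.0, 138.0])
observed_prices = np.array([27.60, 27.45, 27.40, 27.15, 26.90, 26.65, 26.40])

targets = np.array([30.0, 60.0, 90.0, 120.0])     # constant maturities
constant_maturity = natural_cubic_spline(observed_days, observed_prices, targets)
print(constant_maturity.round(2))
```

Repeating this for every observation date fills one row of the constant-maturity matrix. In practice a library routine such as `scipy.interpolate.CubicSpline` with `bc_type="natural"` would normally replace the hand-written solver.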



Figure 1: Reorganized data set: monthly term structures of crude oil futures prices plotted 
against daily spot prices for the period 1990 - 2014. 

Due to the long time span, which includes several turbulent periods, we present the
term structure in separate periods to better highlight the rich dynamics (Figure 2).

Figure 2 (a) illustrates the term structure dynamics in the period 1990 to 2004.
Prices are relatively steady, with a slight shift downwards during the Asian crisis
in the 1990s and a dramatic change after the year 2000 due to the energy crisis.
Some authors attribute the increase in futures prices to speculators and a sudden
shrinkage of oil reserves, while others dispute these arguments (Kilian and Murphy,
2014; Mahadeva et al., 2013). The subsequent period started with steady dynamics
and a decent increase in futures prices across all maturities from the year
2005 (Figure 2 (b)). The calm period was interrupted by a turbulent one
around the year 2008, when crude oil prices exceeded USD 100 per barrel.

^For a detailed discussion of interpolation methods for curve construction with
applications to yield curve modeling, see Hagan and West (2006).

Figure 2: Term structure of WTI futures prices for the period (a) 1990 - 2004, (b) 2005 -
2009, (c) 2008 - 2009, and (d) 2010 - 2014. Dates, days to maturity, and futures prices are
on the corresponding (x, y, z) axes.

Figure 2 (c) provides a more detailed illustration of the rich dynamics during
the period. Military conflicts in Nigeria (including attacks on oil pipelines),
tension between Iran and Israel, and the consequent fear of an oil crisis
accelerated the rise of oil prices to unprecedented levels. Political unrest in the
Middle East together with a sharp depreciation of the U.S. Dollar resulted in
further frequent and significant parallel shifts in the term structure. The global
financial crisis brought the WTI term structure back under USD 100 per barrel, and
the data exhibit a parallel shift upwards at the end of 2009 driven by the
complicated political environment in the Middle East, namely the conflicts in the
Gaza Strip.

Increasing, decreasing or humped shapes of the term structure can be observed
during the most recent five-year period, as illustrated in Figure 2 (d). The WTI
term structure experienced a strong upward parallel shift during 2011 caused by
political unrest in Egypt and Libya together with the weak U.S. Dollar. Another
steep shift upwards in 2012 also had a political reason: the danger of Iran closing
the Strait of Hormuz in response to sanctions against Iran’s nuclear programme.^
Finally, the Greek bailout and the Chinese economy stimulated by increased money
supply contributed to the rise of crude oil prices.

2.3. Stylized facts about term structure 

The previous discussion documents many shapes of the crude oil futures term
structure, which are essentially similar to yield curves of government bonds,
although the data are fundamentally different. The similarities have been discussed
in detail by Grønborg and Lunde (2015), who compare the five stylized facts about
government bond yield curves (Diebold and Li, 2006) to the stylized facts about
crude oil term structures. The discussion is important, as we rely on the dynamic
Nelson-Siegel approach (Diebold and Li, 2006) for modeling the term structure.

The main stylized facts about yield curves are: (1) on average, the yield
curve is increasing in time to maturity, and concave, (2) it exhibits various shapes
through time - upward or downward sloping, humped, and inverted humped, (3)
the “near” end of the yield curve is much more volatile than the “far” end, (4)
yield dynamics are persistent, while the dynamics of spreads are much less
persistent, and (5) long rates are more persistent than short rates.

The term structure of crude oil is, moreover, vulnerable to political decisions and
conflicts; hence it often changes not only through parallel shifts but also in its
actual shape. To document its ability to exhibit a wide variety of shapes, we
borrow Figure 6 from Section 3.1.2, which documents four days with different
shapes of the analyzed curve as illustrative examples. At the end of November
1990, we can observe a smooth decreasing term structure (Figure 6 (a)). In May
1999 (Figure 6 (b)), the curve does not show any smoothness and its behavior is
unclear. Figure 6 (c) shows a smoothly increasing curve, and the most recent
example (Figure 6 (d)) confirms the presence of humped curves in the data.

Probably the most specific feature of crude oil futures markets is
backwardation.^ Hotelling (1931) postulates that the equilibrium price of
non-renewable resources like crude oil, which equals the net marginal revenue,
increases over time at the rate of interest. However, the key factor differentiating
Hotelling’s theory from theories of backwardation in the crude oil market is
uncertainty (Litzenberger and Rabinowitz, 1995). As argued by Haubrich et al.
(2004), the opposite situation on the market - contango - should be present. Futures
prices should be above spot prices of crude oil, as the opportunity cost equal to
the interest rate and storage costs make crude oil stocks disadvantageous. The
convenience yield justifies the occurrence of backwardation on commodity markets.
Storing a commodity implies not only costs but also benefits. The convenience yield
can be understood as the “flow of services that accrues to an owner of the physical
commodity but not to an owner of a contract for future delivery of the commodity
...” (Brennan and Schwartz, 1985). The marginal convenience yields discounted to
the present value then equal the backwardation appearing on the market, implying
exogenously determined backwardation. One can introduce oil production as a call
option to make it endogenous (Litzenberger and Rabinowitz, 1995). An alternative
explanation was proposed by Lautier (2005), who points out the analogy between the
convenience yield and coupons or dividends linked to bonds and stocks, respectively.

^Approximately 20% of worldwide traded crude oil passes through the Strait according
to the U.S. Energy Information Administration, see
http://www.eia.gov/countries/analysisbriefs/World_Oil_Transit_Chokepoints/wotc.pdf

^Backwardation is a situation when futures prices are lower than spot prices.

3. Modeling the term structure 

As motivated by the previous analysis, the crude oil term structure is similar to
that of fixed income securities; hence the modeling vehicle can be shared.^ The most
successful approach used in the recent literature to model and forecast yield
curves was introduced by Diebold and Li (2006). The model is a dynamic
representation of the Nelson-Siegel model (Nelson and Siegel, 1987) and has
recently been used successfully in the crude oil markets by Grønborg and Lunde
(2015). Contrary to affine general equilibrium models, which assume a concrete
functional relationship for the yield curve, this class of models does not stem
from any theoretical grounds and is based only on a parametrization of curve shapes.
Generally, models based on standard statistical methods perform better in curve
fitting and forecasting than affine models (Steeley, 2008).

3.1. Dynamic Nelson-Siegel model 

For the modeling of the term structure of crude oil futures prices, we use the
dynamic Nelson-Siegel model (Diebold and Li, 2006). The choice of this framework is
motivated by several aspects. First, other classes of models, such as no-arbitrage
or affine general equilibrium models, fail in forecasting. As Sarker et al. (2006)
point out, no-arbitrage models focus on the cross-sectional fit of the yield curve
at a particular point in time and therefore fail to capture yield curve dynamics.
Affine models capture time-series dynamics but neglect a proper cross-sectional fit
at a given time. Second, the functional specification of yield curves provided by
Nelson and Siegel (1987) is able to model the diverse shapes observable on markets.
Third, the model provides intuitive parameters, which are straightforward to
explain and interpret. Further, Bliss (1996) has shown that the Nelson-Siegel
model outperforms other methods in yield curve estimation, and Diebold and Li
(2006) show the Nelson-Siegel model to be able to replicate stylized facts about
yield curves. On the contrary, Duffie and Kan (1996) concluded that yield curves
estimated by affine general equilibrium models, such as Vasicek or CIR, do not
conform to the behavior observed on markets.

^There are simplifying assumptions for the crude oil term structure models: there
are no frictions, taxes, or transaction costs on the market, trading is continuous,
lending and borrowing rates are equal, short sales are unconstrained, and markets
are complete (Lautier, 2005).

Diebold and Li (2006) propose to forecast the yield curve using time series of
the three yield curve components formulated in the Nelson-Siegel model. In this
framework, the dynamics of the term structure of crude oil futures prices is
described by

$$p_t(\tau) = \beta_{0t} + \beta_{1t}\left(\frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau}\right) + \beta_{2t}\left(\frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau}\right) \qquad (1)$$

where $p_t(\tau)$ is the price of crude oil futures at time $t = 1, \ldots, T$ with
time to maturity $\tau = 30, 60, 90, \ldots, 720$, and $\beta_{0t}$, $\beta_{1t}$,
and $\beta_{2t}$ are interpreted as coefficients on the level, slope and curvature
factors, respectively. The level factor is a long-term component, as its loading is
constant across all maturities. The slope factor is a short-term component, as its
loading decays exponentially at rate $\lambda_t$. Finally, the curvature factor is
referred to as a medium-term component, as its loading increases for medium-term
maturities and then decays for the longest maturities.



Figure 3: Loadings of Nelson-Siegel latent factors of term structure 

Figure 3 presents estimated loadings of the factors as a function of time to
maturity. The plot uses a fixed decay $\lambda_t = \lambda = 0.0058$ found
empirically in the next section.

The loading on the level factor $\beta_{0t}$ is constant for all maturities and
hence impacts futures prices at all maturities evenly. A change in the level factor
means a parallel shift of the term structure and thus affects prices at all
maturities in the same way. The loading on the slope factor decreases from one (at
zero time to maturity) to zero as maturity goes to infinity. Note that Figure 3
plots maturities starting from 30 days. Compared to the curvature loading, the
slope loading is higher for shorter maturities, which confirms that slope is a
rather short-term factor, i.e. it affects prices associated with shorter maturities
more. In contrast, the loading on the curvature factor $\beta_{2t}$ converges to
zero as time to maturity approaches either zero or infinity, and it is highest for
medium maturities, with a maximum at a time to maturity proportional to
$1/\lambda$.

Figure 4: Time series of $\lambda_t$
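The three loadings discussed above are easy to tabulate. The following is a minimal sketch: the function names `ns_loadings` and `ns_price` are hypothetical, and the decay value 0.0058 is simply the fixed value quoted in the text.

```python
import numpy as np


def ns_loadings(tau, lam):
    """Return the (level, slope, curvature) loadings for maturities tau in days."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    level = np.ones_like(tau)
    return np.column_stack([level, slope, curvature])


def ns_price(tau, beta, lam):
    """Nelson-Siegel curve for coefficients beta = (beta0, beta1, beta2)."""
    return ns_loadings(tau, lam) @ np.asarray(beta, dtype=float)


tau = np.arange(30, 721, 30)           # 24 constant maturities, 30 to 720 days
L = ns_loadings(tau, lam=0.0058)
print(L[[0, 5, 23]].round(3))          # loadings at 30, 180 and 720 days
```

A positive slope coefficient raises the short end relative to the long end, so `ns_price(tau, [60.0, 5.0, 0.0], 0.0058)` gives a downward sloping curve, while a negative one gives an upward sloping curve.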

3.1.1. Decay parameter 

The most important element in the Nelson-Siegel class of models is the parameter
$\lambda_t$, which determines the exponential decay. Low values of the parameter
imply slower decay of the resulting curve and vice versa. Empirically, the choice of
the $\lambda_t$ value represents a trade-off between fitting the close and far ends
of the term structure. Higher values of the parameter result in a better fit of the
functional form for short maturities. Conversely, lower values improve the fit for
longer maturities (Diebold and Li, 2006). The decay parameter also defines the
maturity at which the loading on the medium-term curvature factor $\beta_{2t}$ is
maximized.

In addition, the handling of $\lambda_t$ governs the actual nature of the
relationship defined above. If we allow $\lambda_t$ to evolve dynamically over time,
we obtain a nonlinear problem, which is computationally much more demanding. While
authors in the yield curve literature often consider a 2-year or 3-year time to
maturity as the medium-term maturity and use this assumption to fix
$\lambda_t = \lambda$ for all times $t = 1, \ldots, T$, this is infeasible in the
case of crude oil futures. The literature on the crude oil term structure does not
provide any well-reasoned suggestions about medium-term maturities on oil markets,
and there is almost no reference for a proper choice of $\lambda$, as modeling the
term structure of crude oil markets using Nelson-Siegel family models is not fully
explored in the literature.

A different approach employs nonlinear least squares estimation of all four
parameters in Equation 1, i.e. $\beta_{0t}$, $\beta_{1t}$, $\beta_{2t}$ and
$\lambda_t$ for all $t$. The main problem with such an approach is that $\lambda_t$
may be unstable due to unexpected jumps. While the model will fit the data very
well, its predictive power deteriorates (Vela, 2013).

We find the optimal values of $\lambda_t$ by minimizing the sum of squared errors
of the Nelson-Siegel approximations of the WTI futures term structure for each
observed point in time. Figure 4 illustrates the estimates. To ease the
optimization, we restrict the values of $1/\lambda_t$ to correspond to maturities
between 0 and 1000 days. Since $1/\lambda_t$ sets the maturity scale, in days, at
which the medium-term (i.e. curvature) factor is maximized, a search for the
optimal $1/\lambda_t$ outside this interval is superfluous.

We can observe that $\lambda_t$ is unstable for the crude oil futures data, showing
no clear pattern. Consequently, allowing for a dynamic $\lambda_t$ makes successful
predictions hardly possible. Therefore, we find a single optimal value of $\lambda$
by minimizing the sum of squared errors of the Nelson-Siegel approximation of the
WTI term structure over the whole period as

$$\lambda^* = \arg\min_{\lambda \in \Theta} \sum_{t=1}^{289} \sum_{i=1}^{24} \left( p_t(\tau_i) - \hat{p}_t(\tau_i; \beta_{0t}, \beta_{1t}, \beta_{2t}, \lambda) \right)^2 \qquad (2)$$

where 289 is the total number of observed points in time and 24 is the number of
analyzed constant maturities (from 30 to 720 days). The resulting value is
$\lambda^* = 0.0058$, implying a reciprocal value $1/\lambda^*$ equal to 173.4551,
which yields an acceptable value of the medium-term maturity.^ The result of the
optimization is in line with the reviewed literature. Grønborg and Lunde (2015), who
analyzed oil futures (although in a different period), arrived at $\lambda$ equal
to 0.005.
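One way to carry out the search for a single decay parameter is a grid search in which, for every candidate value, the three coefficients are refitted by OLS on each date and the squared residuals are pooled. The sketch below is an assumption about the mechanics, since the text does not name the optimizer; the function names and the synthetic price panel are hypothetical, and the grid lower bound of 0.001 mirrors the restriction of the reciprocal to at most 1000 days.

```python
import numpy as np


def ns_loadings(tau, lam):
    """Level, slope and curvature loadings as columns of an (N, 3) matrix."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])


def total_sse(prices, tau, lam):
    """Pooled sum of squared OLS residuals for a common decay parameter lam.

    prices has shape (T, N): one row per date, one column per maturity.
    """
    X = ns_loadings(tau, lam)
    betas, *_ = np.linalg.lstsq(X, prices.T, rcond=None)    # shape (3, T)
    resid = prices.T - X @ betas
    return float(np.sum(resid**2))


def best_lambda(prices, tau, grid):
    """Grid value with the smallest pooled sum of squared errors."""
    sse = [total_sse(prices, tau, lam) for lam in grid]
    return grid[int(np.argmin(sse))]


# Synthetic check: noise-free curves generated with a known decay parameter.
rng = np.random.default_rng(0)
tau = np.arange(30, 721, 30)
true_betas = rng.normal([60.0, -5.0, 3.0], [10.0, 3.0, 3.0], size=(50, 3))
prices = true_betas @ ns_loadings(tau, 0.0058).T

grid = np.linspace(0.001, 0.02, 191)    # step 0.0001
lam_star = best_lambda(prices, tau, grid)
print(lam_star)
```

On real data the pooled error surface is not exactly zero at the optimum, so the grid resolution (or a one-dimensional optimizer started from the best grid point) determines how finely the optimum is located.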


3.1.2. Level, slope, and curvature estimates 

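Having set the optimal value of $\lambda^*$, we proceed with the in-sample
estimation of the set of $\beta_t$ coefficients on the latent factors. For all
times $t$, the parameters are obtained from an ordinary least squares (OLS) fit
across maturities

$$\min_{\beta_0, \beta_1, \beta_2} \sum_{i=1}^{24} \left( p_t(\tau_i) - \beta_0 - \beta_1 \left(\frac{1 - e^{-\lambda^* \tau_i}}{\lambda^* \tau_i}\right) - \beta_2 \left(\frac{1 - e^{-\lambda^* \tau_i}}{\lambda^* \tau_i} - e^{-\lambda^* \tau_i}\right) \right)^2 \qquad (3)$$

where $p_t(\tau_i)$ is the WTI futures price at time $t$ with time to maturity
$\tau_i$. This procedure yields time series of the three $\beta$-coefficients, each
with a length of 289 values.
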


^The maximum observed time to maturity in our period reached less than 2000 days
(approximately 6 years), which is much less than the 30 years in the case of the
U.S. yield curve, for which authors claim 2-3 years to be a medium maturity.

Figure 5: Estimated coefficients from the dynamic Nelson-Siegel model of crude oil futures
for the period 1990 - 2014. Level - $\beta_{0t}$, slope - $\beta_{1t}$, and curvature -
$\beta_{2t}$.


Estimates of the $\beta_t$ coefficients are plotted in Figure 5. At first glance,
the behavior of $\beta_{0t}$, the level coefficient, attracts attention. The level
coefficient increasing over the whole observed period corresponds to the general
increase of crude oil prices. The slope and curvature coefficients seem to be more
stable in general. The slope coefficient fluctuates around zero in the first part
of the sample, then becomes positive until 2008, meaning that the resulting term
structure is downward sloping. After 2008, the slope coefficient jumps to large
negative values and remains negative for the following two years, implying an
upward sloping term structure. The most recent period from 2011 is characterized by
positive values, which imply a decreasing term structure. Diebold and Li (2006)
propose to forecast the coefficients on the factor loadings using autoregressive
and vector-autoregressive models, with a random walk as a benchmark. One of the
directly visible features of the estimated coefficients is their non-stationarity.
Stationarity is rejected for the level factor, and the two remaining factors are
borderline cases. Whereas this makes further autoregressive analysis problematic,
it is part of the motivation for the use of neural networks, which do not need to
assume stationary time series. In addition, the factors may contain nonlinearities,
which are not captured by simple linear time series analysis.

Before we turn to the main part of the analysis, forecasting, we illustrate the
fit of the dynamic Nelson-Siegel model to the crude oil futures in Figure 6. Term
structures are generally fitted with a high degree of accuracy for all curve shapes.
Similarly to Diebold and Li (2006), in the case of a term structure with multiple
local extremes (as during May 1999), the approximation is not as accurate.
Figure 6: Examples of term structures of futures contracts on crude oil fitted by the
dynamic Nelson-Siegel model: (a) November 30, 1990, (b) May 31, 1999, (c) December 31,
2008 and (d) March 31, 2012. Days to maturity are on the horizontal axis.


4. Forecasting the term structure with neural networks 


To obtain forecasts of the future term structure from the dynamic Nelson-Siegel
model, Diebold and Li (2006) propose to forecast the individual $\beta_t$
coefficients using linear autoregressive (AR) and vector AR (VAR) models. In this
work, we propose to forecast the individual coefficients on factor loadings using
artificial neural networks. The motivation is straightforward, as the $\beta_t$
coefficients are not stationary for the crude oil futures and may further contain
nonlinear dependence. Linear models are not able to capture these features well;
hence we hypothesize that our proposed approach will yield more accurate forecasts.
Similarly to Diebold and Li (2006), the forecast of the futures price with forecast
horizon $h$ is calculated as

$$\hat{p}_{t+h}(\tau) = \hat{\beta}_{0,t+h} + \hat{\beta}_{1,t+h} \left(\frac{1 - e^{-\lambda^* \tau}}{\lambda^* \tau}\right) + \hat{\beta}_{2,t+h} \left(\frac{1 - e^{-\lambda^* \tau}}{\lambda^* \tau} - e^{-\lambda^* \tau}\right) \qquad (4)$$

where $\hat{\beta}_{i,t+h}$ are the coefficients to be predicted. Both the AR and
VAR models used for prediction by Diebold and Li (2006) are developed to capture
linear features of the time series. Hence, by using them to forecast the
coefficients on factor loadings, one assumes that they are generated by a linear
process. This is not the case for artificial neural networks (ANNs), as ANNs do not
require any assumptions about the statistical properties of the underlying series
for their proper application. ANNs may be viewed as a generalization of these
classical approaches, which allows us to model different types of nonlinearities in
the data.
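To make the benchmark concrete, the sketch below forecasts each coefficient with a univariate AR(1) fitted as a direct $h$-step regression and then rebuilds the curve from the forecast coefficients. This is one common variant of the linear benchmark, chosen here as an assumption; the function names and the simulated coefficient history are hypothetical.

```python
import numpy as np


def ns_loadings(tau, lam):
    """Level, slope and curvature loadings as columns of an (N, 3) matrix."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])


def ar1_direct_forecast(series, h):
    """Regress beta_{t+h} on beta_t by OLS and forecast h steps past the sample end."""
    series = np.asarray(series, dtype=float)
    y, x = series[h:], series[:-h]
    X = np.column_stack([np.ones_like(x), x])
    (c, gamma), *_ = np.linalg.lstsq(X, y, rcond=None)
    return c + gamma * series[-1]


def forecast_curve(betas, tau, lam, h):
    """Forecast the whole curve h periods ahead from a (T, 3) coefficient history."""
    beta_hat = np.array([ar1_direct_forecast(betas[:, j], h) for j in range(3)])
    return ns_loadings(tau, lam) @ beta_hat


# Hypothetical coefficient history: three persistent series, 120 months long.
rng = np.random.default_rng(2)
shocks = rng.normal(scale=[1.0, 0.5, 0.5], size=(120, 3))
betas = np.array([60.0, -2.0, 1.0]) + np.cumsum(shocks, axis=0) * 0.3

tau = np.arange(30, 721, 30)
curve_3m = forecast_curve(betas, tau, lam=0.0058, h=3)
print(curve_3m.round(2))
```

Replacing `ar1_direct_forecast` with any other one-series forecaster, such as a neural network, leaves `forecast_curve` unchanged, which is the sense in which the forecasting method plugs into Equation 4.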

Although neural networks, which imitate neural processing in the brain, are
primarily associated with biological systems and have been applied successfully in
numerous fields, such as pattern recognition and medical diagnostics, many
econometricians argue that the approach is a black box. Together with the fact that
one must make arbitrary decisions about the implementation of the network, i.e.,
the number of hidden layers, the choice of transformation functions, the number of
neurons, etc., this means that neural networks are still not commonly used for
financial time series modeling, and we are pioneering their use in term structure
forecasting.

Setting these concerns aside, we use a neural network as a generalized nonlinear
regression that is able to describe the complex patterns in the time series of the
curve parameters. Like other linear or nonlinear methods, a neural network relates
a set of input variables, say lags of a time series, to an output - in our case the
forecast. The only difference between a network and other models is that the
approximating function uses one or more so-called hidden layers, in which the input
variables are squashed or transformed by a special function.

The most widely used artificial neural network in financial applications is the
feed-forward neural network with one hidden layer (Hornik et al., 1989). The
general feed-forward or multi-layered perceptron (MLP) network we use for
forecasting the $\beta_{t+h}$ coefficient may be described by the following
equations:

$$\beta_{t+h} = \gamma_0 + \sum_{k=1}^{k^*} \gamma_k \Lambda(n_{k,t}) \qquad (5)$$

$$\Lambda(n_{k,t}) = \frac{1}{1 + e^{-n_{k,t}}} \qquad (6)$$

$$n_{k,t} = \sum_{i=0}^{m+1} \omega_{k,i} \, \beta_{t-i} \qquad (7)$$

with $k^*$ neurons $n_{k,t}$, and $\omega_{k,i}$ representing a coefficient vector or
weights vector to be found. The variable $n_{k,t}$, consisting of $m+1$ lags of the
time series being forecast, is squashed by the hyperbolic tangent transfer function
and becomes a neuron $\Lambda(n_{k,t})$. Next, the set of $k^*$ neurons is combined
linearly with the vector of coefficients $\{\gamma_k\}_{k=1}^{k^*}$ to form the
final output, which is the forecast of the $\beta_{t+h}$ coefficient on factor
loadings from the dynamic Nelson-Siegel model. The general feed-forward network is
the workhorse of the neural network modeling approach in the finance industry, as
almost all researchers begin with this network as the first alternative to linear
models.
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Note that AR is a simple special case within this framework if transformation 
is skipped (i.e. A(nfc^i) = rit^k) and one neuron that contains a linear 
approximation function is used. Therefore, in addition to classical linear models, 
there are neurons that process the inputs to improve the predictions. 
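To make the structure of Equations (5)-(7) concrete, the following is a minimal Python sketch of the forward pass of a one-hidden-layer network, together with the AR special case obtained by skipping the transformation. The function names, the argument layout and all numerical weights are illustrative assumptions and are not taken from the estimated models; the logistic function follows Equation (6).

```python
import math

def logistic(n):
    """Logistic transfer function, as written in Equation (6)."""
    return 1.0 / (1.0 + math.exp(-n))

def mlp_forecast(lags, gamma0, gammas, omegas, transfer=logistic):
    """One-hidden-layer feed-forward forecast in the spirit of Equations (5)-(7).

    lags   : [beta_t, beta_{t-1}, ..., beta_{t-m}]
    gammas : output weights, one per hidden neuron
    omegas : one row per neuron, [omega_k0, omega_k1, ..., omega_k(m+1)]
    """
    forecast = gamma0
    for gamma_k, omega_k in zip(gammas, omegas):
        # Linear combination of the lags plus a bias term (Equation 7).
        n_k = omega_k[0] + sum(w * b for w, b in zip(omega_k[1:], lags))
        # Squash the neuron input and add its weighted contribution (Equation 5).
        forecast += gamma_k * transfer(n_k)
    return forecast

# Made-up weights: two hidden neurons fed by a single lag.
nonlinear = mlp_forecast([0.5], gamma0=0.1, gammas=[0.8, -0.3],
                         omegas=[[0.0, 1.0], [0.2, -0.5]])

# AR(1) special case: one neuron, identity transfer, gamma0 = 0, gamma_1 = 1,
# which collapses to beta_{t+h} = c + phi * beta_t.
c, phi = 0.2, 0.9
ar1 = mlp_forecast([0.5], gamma0=0.0, gammas=[1.0], omegas=[[c, phi]],
                   transfer=lambda n: n)
```

With the identity transfer the network output equals `c + phi * 0.5`, which is exactly the linear autoregressive forecast; any additional neurons add nonlinear corrections on top of it.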

To be able to approximate the target function, the neural network must be 
able to “learn”. The process of learning is defined as the adjustment of weights 
using a learning algorithm. The main goal of the learning process is to minimize 
the sum of the prediction errors for all training examples. The training phase 
is thus an unconstrained nonlinear optimization problem, where the goal is to 
find the optimal set of weights by solving the minimization 
problem: 

$$\min \{ T(\omega) : \omega \in \mathbb{R}^n \}, \qquad (8)$$

where $T : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable error 
function. There are several ways of minimizing $T(\omega)$, but basically we are 
searching for the gradient $G = \nabla T(\omega)$ of function $T$, which is the 
vector of the first partial derivatives of the error function $T(\omega)$ with 
respect to the weight vector $\omega$. Furthermore, the gradient specifies a 
direction that produces the steepest increase in $T$. The negative of this 
vector thus provides us the direction of steepest decrease. 

Nevertheless, the traditional gradient descent algorithms often fail to learn 
intricate patterns in the data efficiently due to many possible initial 
settings. One of the efficient methods for learning the patterns in feed-forward 
neural networks, which we use, is the Levenberg-Marquardt back-propagation. 
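The role of the gradient in the minimization problem can be illustrated with a short Python sketch. It uses plain gradient descent on the sum of squared errors of a single linear neuron, not the Levenberg-Marquardt algorithm used in the paper; the data, the learning rate and the function names are assumptions chosen only for illustration.

```python
def error(w, data):
    """Sum of squared prediction errors T(w) of a linear neuron w0 + w1 * x."""
    return sum((y - (w[0] + w[1] * x)) ** 2 for x, y in data)

def gradient(w, data):
    """Analytical gradient of T(w): first partial derivatives w.r.t. w0 and w1."""
    g0 = sum(-2.0 * (y - (w[0] + w[1] * x)) for x, y in data)
    g1 = sum(-2.0 * (y - (w[0] + w[1] * x)) * x for x, y in data)
    return [g0, g1]

def gradient_descent(w, data, rate=0.01, steps=500):
    """Repeatedly move the weights along the negative gradient."""
    path = [error(w, data)]
    for _ in range(steps):
        g = gradient(w, data)
        w = [wi - rate * gi for wi, gi in zip(w, g)]
        path.append(error(w, data))
    return w, path

# Made-up training examples generated by y = 1 + 2x.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weights, path = gradient_descent([0.0, 0.0], data)
```

Each step along the negative gradient lowers the error in this convex toy problem, and the weights approach the values that generated the data. For a network with hidden layers the error surface is no longer convex, which is where the choice of starting values and of the learning algorithm starts to matter.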

4.1. Focused time-delay neural network 

To be able to fully explore the time dependence in time series, we use a simple 
extension of the feed-forward framework, as dynamic neural networks are capable 
of learning the dynamics of time series relationships more effectively. A 
Time-Delay Neural Network is a feed-forward network with a tapped delay line at 
the input. It is similar to a multilayer perceptron, as all connections feed 
forward. In addition, the inputs to any node consist of the outputs of earlier 
nodes from previous time steps. This is generally implemented using tap-delay 
lines. 

The most straightforward class of general dynamic neural networks has delays 
only on the input units and is known as the Focused Time-Delay Neural Network 
(Clouse et al., 1997). It consists of a set of feed-forward networks with a 
tapped delay line capturing the autoregressive property of the inspected series. 
We propose to use the Focused Time-Delay Neural Network (FTDNN) for forecasting 
of the $\beta_t$ loadings. The delay $\Delta$ is introduced to Equation 7 as 

$$n_{k,t} = \omega_{k,0} + \sum_{i=1}^{m+1} \omega_{k,i} \beta_{t-(i-1)\Delta} \qquad (9)$$
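A tapped delay line at the input can be sketched in a few lines of Python. The sketch assumes that the delayed inputs are $\beta_t, \beta_{t-\Delta}, \ldots, \beta_{t-m\Delta}$; the helper names and the short factor series are hypothetical and serve only as an illustration.

```python
def tapped_delay_inputs(series, m, delay=1):
    """Build input vectors [beta_t, beta_{t-D}, ..., beta_{t-m*D}].

    Returns (t, taps) pairs for every t that has a full set of delayed values.
    """
    first_t = m * delay
    rows = []
    for t in range(first_t, len(series)):
        taps = [series[t - i * delay] for i in range(m + 1)]
        rows.append((t, taps))
    return rows

def neuron_input(taps, omega_k):
    """Bias omega_k[0] plus the weighted sum of the delayed values."""
    return omega_k[0] + sum(w * b for w, b in zip(omega_k[1:], taps))

beta = [1.0, 1.2, 0.9, 1.1, 1.4, 1.3]   # made-up factor series
rows = tapped_delay_inputs(beta, m=1, delay=2)
first_neuron = neuron_input(rows[0][1], [0.5, 1.0, -1.0])
```

Because the delays sit only at the input, the rest of the network stays a static feed-forward map and can be trained with the same algorithm as the MLP.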

In order to forecast the three time series of coefficients estimated by the 
Nelson-Siegel model, we naturally use three separate networks. To prevent 
over-fitting, we use cross-validation over time with a fixed window. The best 
model is always chosen based on the cross-validation scheme. In-sample (training 
and validation) and out-of-sample (testing) datasets are chosen in the usual 
ratio of 60%, 20%, and 20% for training, validation, and testing, respectively. 
In terms of the out-of-sample forecast period, we start to forecast the futures 
prices from 2010. The same period is also used for forecasts from the competing 
models defined in the next section. 
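The 60%/20%/20% division can be illustrated by a chronological split that keeps the time ordering intact. This is only a sketch: the function name is hypothetical, the placeholder series stands in for one of the estimated factor series, and the fixed-window cross-validation itself is not reproduced here.

```python
def chronological_split(series, train_pct=60, validation_pct=20):
    """Split a time-ordered series into training, validation and test blocks.

    Integer percentages avoid floating-point rounding at the block boundaries.
    """
    n = len(series)
    train_end = n * train_pct // 100
    validation_end = n * (train_pct + validation_pct) // 100
    return (series[:train_end],
            series[train_end:validation_end],
            series[validation_end:])

observations = list(range(100))          # placeholder for a factor series
train_set, validation_set, test_set = chronological_split(observations)
```

No observation from the test block precedes an observation used for training or validation, so the out-of-sample evaluation never looks into the future.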

The input layer consists of m lags relevant for the forecast, where m can be 
determined by inspecting the respective sample autocorrelation function. In 
order to retain comparability of forecast results with the AR(1) and VAR(1) 
models, we use one lag. A simple network with one hidden layer consisting of up 
to 20 hidden neurons is considered. The output neuron is the h-step-ahead 
forecast of the particular $\beta_t$ coefficient; 1-month, 3-month, 6-month and 
12-month-ahead forecasts have been examined. The final decision about the 
network structure was made according to the Hannan-Quinn information 
criterion†, as it punishes networks with an excess number of parameters. 
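How an information criterion trades fit against network size can be sketched as follows. The sketch assumes the commonly used form of the Hannan-Quinn criterion, ln(SSR/N) + k ln(ln(N))/N, with SSR the sum of squared residuals, N the number of observations and k the number of weights; the residuals and the parameter count helper are hypothetical.

```python
import math

def hqic(residuals, n_params):
    """Hannan-Quinn criterion, assumed form: ln(SSR / N) + k * ln(ln(N)) / N."""
    n = len(residuals)
    ssr = sum(e * e for e in residuals)
    return math.log(ssr / n) + n_params * math.log(math.log(n)) / n

def n_weights(hidden, lags=1):
    """Weights of a one-hidden-layer network: each hidden neuron has `lags`
    input weights and a bias, the output has `hidden` weights and a bias."""
    return hidden * (lags + 2) + 1

# Hypothetical residuals: the larger network fits only marginally better.
residuals_small = [0.30 * (-1) ** i for i in range(60)]
residuals_large = [0.29 * (-1) ** i for i in range(60)]
score_small = hqic(residuals_small, n_weights(hidden=2))
score_large = hqic(residuals_large, n_weights(hidden=10))
```

In this made-up case the small improvement in fit does not compensate for the 24 additional weights, so the network with two hidden neurons obtains the lower (better) criterion value.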

5. Out-of-sample forecasting performance 
5.1. Competing models 

The main interest of this work is in assessing the out-of-sample forecasting 
performance of neural networks in forecasting the term structure of crude oil 
futures. Naturally, we assess the performance relative to competing models used 
in the literature. The first competing model we consider is a simple AR(1) 
process for all three $\beta_{i,t+h}$ coefficients, $i = \{1, 2, 3\}$: 

$$\hat{\beta}_{i,t+h} = c_i + \gamma_i \beta_{i,t}, \qquad (10)$$

where coefficients $c_i$ and $\gamma_i$ are obtained by regressing $\beta_{i,t}$ 
on $\beta_{i,t-h}$ and an intercept. Factor loadings $\beta_{i,t+h}$ may 
generally contain a unit root, which will result in poor forecasts due to large 
possible biases in estimates. Still, the model is used in the literature 
modeling yield curves and term structures. 
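The direct h-step regression behind the AR(1) benchmark can be sketched with ordinary least squares in plain Python. The function names and the artificial series are assumptions for illustration; the series is constructed so that the true coefficients are known.

```python
def fit_direct_ar1(series, h):
    """OLS of beta_t on an intercept and beta_{t-h}; returns (c, gamma)."""
    x = series[:-h]          # regressor beta_{t-h}
    y = series[h:]           # regressand beta_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gamma = sxy / sxx
    c = mean_y - gamma * mean_x
    return c, gamma

def forecast_ar1(last_value, c, gamma):
    """h-step-ahead forecast c + gamma * beta_t."""
    return c + gamma * last_value

# Made-up series following beta_t = 1 + 0.5 * beta_{t-1} exactly.
series = [4.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])
c, gamma = fit_direct_ar1(series, h=1)
next_value = forecast_ar1(series[-1], c, gamma)
```

For h larger than one the same regression is run on observations h periods apart, so each horizon obtains its own pair of coefficients instead of iterating the one-step model.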

A second benchmark model for forecasting the term structure we consider is the 
vector autoregressive model, where 

$$\hat{\beta}_{t+h} = c + \Gamma \beta_t, \qquad (11)$$

with $c$ and $\Gamma$ holding coefficients to be estimated. In case of 
autoregressive model, issues implied by potential unit root presence in one of 
the series are not so severe. However, unrestricted VAR models perform quite 
poorly in forecasting tasks. The poor performance is caused mainly by the danger 
of over-parametrization due to the large number of parameters. Diebold and Li 
(2006) also note that factors do not share cross-correlation structure, hence we 
should not expect the VAR(1) model to produce superior forecasts. In the case of 
term structures, the situation is different, as the coefficients share 
interaction to be modeled. 

† $HQIC = \ln\left(\frac{\sum_{t=1}^{N} \hat{\varepsilon}_t^2}{N}\right) + \frac{k \ln(\ln(N))}{N}$

As a final benchmark model, we consider the Random Walk, where the forecast is 
the last observed value 

$$\hat{\beta}_{i,t+h} = \beta_{i,t}. \qquad (12)$$

All four models are used to forecast the term structure of crude oil futures, 
both in one-step-ahead and multi-step-ahead predictions (we consider 1-, 3-, 6-, 
and 12-months-ahead). 


5.2. Evaluation of forecasts 

To statistically compare the accuracy of the forecasts from different models, we 
employ two common loss functions, namely the root mean square error (RMSE) and 
the mean absolute error (MAE). The measures are calculated for the 
$i = 1, \ldots, N$ forecasts as 

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{p}_{t+i} - p_{t+i})^2} \qquad (13)$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} |\hat{p}_{t+i} - p_{t+i}| \qquad (14)$$
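Both loss functions are straightforward to compute; a minimal Python sketch follows. The function names and the price and forecast values are made up for illustration and do not come from the data set.

```python
import math

def rmse(forecasts, actuals):
    """Root mean square error of the forecasts, Equation (13)."""
    n = len(actuals)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / n)

def mae(forecasts, actuals):
    """Mean absolute error of the forecasts, Equation (14)."""
    n = len(actuals)
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / n

actuals = [100.0, 102.0, 98.0, 101.0]      # made-up futures prices
forecasts = [101.0, 100.0, 99.0, 101.0]    # made-up model forecasts
rmse_value = rmse(forecasts, actuals)
mae_value = mae(forecasts, actuals)
```

The forecast errors here are 1, -2, 1 and 0, so the RMSE weights the single larger error more heavily than the MAE does.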

As discussed by Nomikos and Pouliasis (2011), these metrics do not provide 
information about the asymmetry of the errors. While asymmetric errors are 
commonly found by the volatility literature, it may also be of interest to see 
whether the models systematically over- or under-predict the term structures. 
For example, Nomikos and Pouliasis (2011), Wang and Wu (2012), and Baruník and 
Křehlík (2014) find the majority of forecasting models to over-predict the 
volatility on petroleum markets. The bias then translates to direct economic 
losses. Hence, as suggested by Nomikos and Pouliasis (2011), we employ two 
additional mean mixed error (MME) loss functions (Brailsford and Faff, 1996) to 
assess the forecasts. These functions use a mixture of positive and negative 
forecast errors with different weights, allowing us to discover the cases where 
the model tends to over- or under-predict 

$$MME(O) = \frac{1}{N} \left( \sum_{i \in U} |\hat{p}_{t+i} - p_{t+i}| + \sum_{i \in O} \sqrt{|\hat{p}_{t+i} - p_{t+i}|} \right), \qquad (15)$$

$$MME(U) = \frac{1}{N} \left( \sum_{i \in U} \sqrt{|\hat{p}_{t+i} - p_{t+i}|} + \sum_{i \in O} |\hat{p}_{t+i} - p_{t+i}| \right), \qquad (16)$$

where U is the set containing under-predictions and O is the set containing 
over-predictions. 
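The two mixed error statistics differ only in which set of errors enters under the square root, which a short Python sketch makes explicit. It assumes that an under-prediction is a forecast below the realized value; the function name, the `root_on` argument and the numbers are hypothetical.

```python
import math

def mme(forecasts, actuals, root_on):
    """Mean mixed error, Equations (15) and (16).

    root_on="U" applies the square root to under-predictions, giving MME(U);
    root_on="O" applies it to over-predictions, giving MME(O).
    """
    total = 0.0
    for f, a in zip(forecasts, actuals):
        err = f - a
        in_root_set = (err < 0) if root_on == "U" else (err > 0)
        total += math.sqrt(abs(err)) if in_root_set else abs(err)
    return total / len(actuals)

actuals = [100.0, 100.0, 100.0]      # made-up realized prices
forecasts = [96.0, 109.0, 99.0]      # errors: -4, +9, -1
mme_u = mme(forecasts, actuals, root_on="U")
mme_o = mme(forecasts, actuals, root_on="O")
```

With these made-up errors the two statistics differ, which is the signal of asymmetry that RMSE and MAE cannot provide; a forecast that hits the realized value exactly contributes zero to both.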

To test for significant differences in the loss functions of competing models, 
we use the Model Confidence Set (MCS) methodology of Hansen et al. (2011). Given 
a set of forecasting models, $M_0$, we identify the model confidence set 
$M^*_{1-\alpha} \subset M_0$, which is the set of models that contains the 
“best” forecasting model given a level of confidence $\alpha$. For a given model 
$i \in M_0$, the p-value is the threshold confidence level. Model $i$ belongs to 
the MCS only if $p_i > \alpha$. The MCS methodology repeatedly tests the null 
hypothesis of equal forecasting accuracy 

$$H_0 : E[L_{i,t} - L_{j,t}] = 0, \quad \text{for all } i, j \in M,$$

with $L_{i,t}$ being an appropriate loss function of the i-th model. Starting 
with the full set of models, $M = M_0$, this procedure sequentially eliminates 
the worst-performing model from $M$ when the null is rejected. The surviving set 
of models then belongs to the model confidence set $M^*_{1-\alpha}$. Following 
Hansen et al. (2011), we implement the MCS using a stationary bootstrap with an 
average block length of 20 days.‡ 
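The resampling step underlying the MCS can be sketched with a stationary bootstrap, in which block lengths are geometrically distributed with a chosen mean. The sketch below covers only the resampling of loss differences, not the full MCS elimination procedure; the function names, the number of draws and the seed are assumptions.

```python
import random

def stationary_bootstrap_indices(n, avg_block=20, rng=None):
    """Resample indices in blocks of random (geometric) length with mean
    avg_block, wrapping around the end of the sample."""
    rng = rng or random.Random()
    p = 1.0 / avg_block
    indices = [rng.randrange(n)]
    while len(indices) < n:
        if rng.random() < p:
            indices.append(rng.randrange(n))          # start a new block
        else:
            indices.append((indices[-1] + 1) % n)     # continue current block
    return indices

def bootstrap_mean_loss_differences(loss_a, loss_b, draws=500,
                                    avg_block=20, seed=0):
    """Bootstrap distribution of the mean loss difference between two models."""
    rng = random.Random(seed)
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    means = []
    for _ in range(draws):
        idx = stationary_bootstrap_indices(n, avg_block, rng)
        means.append(sum(d[i] for i in idx) / n)
    return means

sample_idx = stationary_bootstrap_indices(50, avg_block=5, rng=random.Random(1))
means = bootstrap_mean_loss_differences([1.0] * 40, [0.5] * 40, draws=100)
```

Resampling whole blocks instead of single observations preserves the serial dependence of the loss differences, which is why the average block length is a tuning choice whose influence is worth checking.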

5.3. Discussion of the results 

Four forecasting models - focused time-delay neural network (FTDNN), AR(1), 
VAR(1) and random walk (RW) - are used to forecast the term structure of crude 
oil futures, both in one-step-ahead and multi-step-ahead predictions. We begin 
with a discussion of aggregate results. The average RMSE of forecasts over all 
maturities in Table 3 reveals that FTDNN produces forecasts with the lowest 
errors for all forecasting horizons considered. The second-best forecasting 
model is the AR(1) model, confirming the conclusions of Diebold and Li (2006) on 
yield curve data, who find the AR(1) model to outperform both VAR(1) and RW. 

While average results provide us with the first notion of how the models 
perform against each other, Tables 4 and 5 in Appendix A provide a summary of 
forecast performance for individual maturities. For better clarity of the 
results, we report RMSE and MAE relative to the respective statistics from RW as 
a benchmark model. A simple ratio quickly tells us how the model under 
evaluation compares to the benchmark Random Walk. Moreover, the Model Confidence 
Set is found across all models for all times to maturity and multi-step-ahead 
forecasts. 

‡ We have used different block lengths, including the ones depending on the 
forecasting horizons, to assess the robustness of the results, without any 
change in the final results. These results are available from the authors upon 
request. 

Horizon       FTDNN     AR(1)    VAR(1)        RW
1 month       4.398     4.708     4.971     4.772
3 months      6.077     7.572     8.060     7.952
6 months      6.425     8.868    10.362    10.140
12 months     7.881     7.947    11.487     9.841

Table 3: Average RMSE across all constant maturities. 
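The ratio to the Random Walk benchmark can be illustrated on the averages of Table 3. The sketch below reads the table entries with a decimal point (an assumption about the original number format); the variable and function names are hypothetical.

```python
# Average RMSE from Table 3, read with a decimal point.
table3 = {
    "1 month":   {"FTDNN": 4.398, "AR(1)": 4.708, "VAR(1)": 4.971, "RW": 4.772},
    "3 months":  {"FTDNN": 6.077, "AR(1)": 7.572, "VAR(1)": 8.060, "RW": 7.952},
    "6 months":  {"FTDNN": 6.425, "AR(1)": 8.868, "VAR(1)": 10.362, "RW": 10.140},
    "12 months": {"FTDNN": 7.881, "AR(1)": 7.947, "VAR(1)": 11.487, "RW": 9.841},
}

def relative_to_rw(row):
    """Express each model's RMSE as a ratio to the Random Walk RMSE."""
    return {model: value / row["RW"] for model, value in row.items()}

relative = {horizon: relative_to_rw(row) for horizon, row in table3.items()}
best = {horizon: min(row, key=row.get) for horizon, row in table3.items()}
```

A ratio below one means that the model beats the Random Walk at that horizon. On these averages the FTDNN ratio is below one at every horizon, while the VAR(1) ratio is above one throughout.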


In the case of the one-month-ahead forecast, FTDNN yields the lowest RMSE and 
MAE in comparison to the rest of the models. FTDNN is the only model in the 
Model Confidence Set for maturities lower than 630 days according to RMSE. For 
the longer maturities considered, specifically 660, 690, and 720 days, AR(1) and 
RW belong to the Model Confidence Set, while VAR(1) is rejected every time. 
Looking at MAE, the situation is very similar, with the only difference that for 
maturities longer than 420 days, the FTDNN, AR(1), and RW models produce 
statistically indistinguishable forecasts, while FTDNN produces the lowest 
average statistics. 

The difference between FTDNN and all the other models is even more pronounced 
when forecasting 3-months-ahead, where the forecasts from FTDNN are the only 
forecasts included as best forecasts by the MCS for all times to maturity. This 
means that FTDNN produces significantly better forecasts than all other models 
at all maturities. 

Longer forecasts for 6-months-ahead show that FTDNN produces even larger 
improvements in terms of RMSE and MAE at shorter maturities, where it is the 
only model belonging to the MCS. The longer the maturity, the lower the gains 
from the FTDNN against all other models. While FTDNN produces the lowest average 
RMSE and MAE, none of the models can be rejected from the Model Confidence Set 
for maturities larger than 300 days. This means that all models produce 
statistically similar 6-months-ahead forecasts for longer maturities. 

The longest horizon forecasts of one year show similar results to the 6-month 
forecast, with VAR(1) and RW being rejected from the Model Confidence Set for 
all maturities. For the short maturities, the FTDNN produces the best forecasts, 
while for longer maturities, AR(1) is included in the MCS as well. 

Summarizing the results from RMSE and MAE, we can see that FTDNN produces the 
forecasts with significantly lowest errors in comparison to other competing 
models for short maturities and short forecasting horizons. For maturities 
longer than 300 days and longer forecasting horizons, other models play a role. 
Often, forecasts from the AR(1) model cannot be statistically distinguished from 
the forecasts from FTDNN. We need to note here that FTDNN includes only one 
delayed input to make the model comparable to the AR(1) and VAR(1) strategies 
used by the literature, and the forecasts will even improve with an increasing 
number of lags in the FTDNN. While we have experimented with the number of lags 
and obtained even lower errors, the sample size of the data does not allow us to 
rigorously study these models, and we leave it for future research. 

To see whether the models over- or under-predict the term structures, we employ 
the MME(U) and MME(O) statistics. 

Table 6 shows the average number of cases when the error from the model is 
negative for all models across forecasting horizons and maturities. Table 7 
shows the average number of cases when the error from the model is positive. In 
addition, asymmetric errors are tested using MME(U) and MME(O) in the MCS 
framework. In short, Table 6 shows whether the models tend to under-predict the 
term structures, while Table 7 shows whether they tend to over-predict them. 

The important observation from the asymmetric loss functions is that the models 
in general produce symmetric forecasts at short forecasting horizons and short 
times to maturity. With longer times to maturity, the FTDNN tends to 
under-predict at 1-month and 12-month-ahead forecasts, while it tends to 
over-predict at 3-month and 6-month-ahead forecasts. AR(1) tends to generally 
under-predict at all forecasting horizons. For the longest forecasting horizon 
of 1 year and longer times to maturity, AR(1) largely over-predicts the futures 
prices. The results reveal a similar pattern in terms of forecast comparisons. 
FTDNN is never rejected from the Model Confidence Set. 

6. Conclusion 

This paper investigates the properties of the term structure of crude oil 
markets and proposes dynamic neural networks for its forecasting. 

The term structure of crude oil futures prices exhibits behavior very similar to 
the government bond yield curve, and the three-factor dynamic Nelson-Siegel 
model (Diebold and Li, 2006) used by the literature for modeling yield curves 
captures the shapes of the term structure very well. We further forecast the 
factors using a dynamic neural network. 

The proposed framework yields significant improvements in the futures price 
forecasts when compared to other benchmark models. We show the performance on 
the 1-month, 3-month, 6-month and 12-month forecasting horizons. Moreover, 
forecasting errors from our approach have traceable patterns. For a fixed 
forecasting horizon, the deviation between the forecast and the observed futures 
price decreases as time to maturity increases. Furthermore, for more distant 
forecast horizons the deviation on average increases, as expected. 

In summary, this work has shown that the crude oil term structure can be 
successfully modeled and predicted by the parsimonious Nelson-Siegel model, 
primarily developed for interest rates, coupled with the generalized regression 
framework of neural networks. Future research will show if our results hold for 
other commodities as well. An interesting and important approach would also be 
to use the framework to study the commonalities between factors across various 
commodities. 